Near-Optimal Coresets of Kernel Density Estimates

Authors

  • Jeff M. Phillips
  • Wai Ming Tai
Abstract

We construct near-optimal coresets for kernel density estimates for points in R^d when the kernel is positive definite. Specifically, we show a polynomial-time construction for a coreset of size O(√(d log(1/ε))/ε), and we show a near-matching lower bound of size Ω(√d/ε). The upper bound is a polynomial-in-1/ε improvement when d ∈ [3, 1/ε^2) (for all kernels except the Gaussian kernel, which had a previous upper bound of O((1/ε) log(1/ε))), and the lower bound is the first known lower bound to depend on d for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.

1998 ACM Subject Classification: I.3.5 Computational Geometry and Object Modeling
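To make the quantity in the abstract concrete, the following is a minimal sketch of a kernel density estimate and of the error between the estimate of a full point set and that of a smaller subset. It is not the construction from the paper: the subset here is a plain uniform random sample, used only as a baseline. The function names (`gaussian_kernel`, `kde`, `max_kde_error`), the unnormalized Gaussian kernel, the bandwidth, and the finite set of query points are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, p, bandwidth=1.0):
    """Unnormalized Gaussian kernel exp(-||x - p||^2 / (2 * bandwidth^2)).

    Assumed here as one example of a positive definite kernel.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kde(points, x, kernel=gaussian_kernel):
    """Kernel density estimate at x: the average kernel value over `points`."""
    return sum(kernel(x, p) for p in points) / len(points)


def max_kde_error(points, subset, queries, kernel=gaussian_kernel):
    """Largest |KDE_points(x) - KDE_subset(x)| over the given query points.

    Because only finitely many queries are checked, this is an empirical
    lower bound on the true L-infinity error, not the error itself.
    """
    return max(
        abs(kde(points, x, kernel) - kde(subset, x, kernel)) for x in queries
    )


if __name__ == "__main__":
    rng = random.Random(0)
    d = 3
    # Hypothetical input: 1000 points drawn from a standard normal in R^d.
    P = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(1000)]
    # Baseline subset: a uniform random sample, NOT the paper's coreset.
    Q = rng.sample(P, 100)
    queries = [[rng.uniform(-3.0, 3.0) for _ in range(d)] for _ in range(200)]
    print("empirical max error:", max_kde_error(P, Q, queries))
```

A subset Q would count as an ε-coreset in the sense of the abstract if this error were at most ε for every query point in R^d, so a check like this can refute a candidate subset but cannot certify it.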

Similar resources

Improved Coresets for Kernel Density Estimates

We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate with a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the L∞ error between kernel density estimates within error ε, with cor...

Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

The size of large, geo-located datasets has reached scales where visualization of all data points is inefficient. Random sampling is a method to reduce the size of a dataset, yet it can introduce unwanted errors. We describe a method for subsampling of spatial data suitable for creating kernel density estimates from very large data and demonstrate that it results in less error than random sampl...

Diversity Maximization via Composable Coresets

Given a set S of points in a metric space, and a diversity measure div(·) defined over subsets of S, the goal of the diversity maximization problem is to find a subset T ⊆ S of size k that maximizes div(T). Motivated by applications in massive data processing, we consider the composable coreset framework in which a coreset for a diversity measure is called α-composable if for any collection o...

eps-Kernel Coresets for Stochastic Points

We study the problem of constructing ε-kernel coresets for uncertain points. We consider uncertainty under the existential model, where each point’s location is fixed but only occurs with a certain probability, and the locational model, where each point has a probability distribution describing its location. An ε-kernel coreset approximates the width of a point set...

Boundary Adjusted Density Estimation and Bandwidth Selection

This paper studies boundary effects of the kernel density estimation and proposes some remedies to the problems. Since the kernel estimate is designed for estimating a smooth density, it introduces a large bias near the boundaries where the density is discontinuous. Bandwidth selectors developed for the kernel estimate that select a small bandwidth to reduce the bias can dramatically increase t...

Journal title:
  • CoRR

Volume: abs/1802.01751

Publication date: 2018